{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72045c8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2022 NVIDIA Corporation. All Rights Reserved.\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License.\n",
    "# =============================================================================="
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2854a414",
   "metadata": {},
   "source": [
    "<img src=\"https://developer.download.nvidia.com/tesla/notebook_assets/nv_logo_torch_trt_resnet_notebook.png\" style=\"width: 90px; float: right;\">\n",
    "\n",
    "# ResNet C++ Serving Example\n",
    "\n",
    "This example shows how you can load a pretrained ResNet-50 model, convert it to a Torch-TensorRT optimized model (via the Torch-TensorRT Python API), save the model as a torchscript module, and then finally load and serve the model with the PyTorch C++ API. The process can be demonstrated with the below workflow diagram:\n",
    "\n",
    "<img src=\"./images/Torch-TensorRT-CPP-inference.JPG\">\n",
    "\n",
    "The Python conversion part largely follows the [Resnet50-example](./Resnet50-example.ipynb). Here for simplicity, we will only download the model and do the conversion.\n",
    "\n",
    "\n",
    "## Pre-requisite\n",
    "This example should be executed from an NGC PyTorch container. \n",
    "```\n",
    "docker pull nvcr.io/nvidia/pytorch:22.05-py3\n",
    "docker run --rm --net=host -it nvcr.io/nvidia/pytorch:22.05-py3 bash\n",
    "```\n",
    "Though this example was tested with the `22.05` version, you can try and replace `22.05` with a later version of the container. \n",
    "\n",
    "Inside the container, install and start Jupyter-lab with:\n",
    "```\n",
    "apt update && pip install jupyterlab\n",
    "jupyter lab --ip 0.0.0.0 --allow-root\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9be4551e",
   "metadata": {},
   "source": [
    "## 1. Download and optimize the ResNet-50 pretrained model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70918487",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "torch.hub._validate_not_a_forked_repo=lambda a,b,c: True\n",
    "\n",
    "resnet50_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)\n",
    "resnet50_model.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cf7143b",
   "metadata": {},
   "source": [
    "### Torch-TensorRT optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31a1ead4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch_tensorrt\n",
    "\n",
    "# The compiled module will have precision as specified by \"op_precision\".\n",
    "# Here, it will have FP32 precision.\n",
    "trt_model_fp32 = torch_tensorrt.compile(resnet50_model, inputs = [torch_tensorrt.Input((128, 3, 224, 224), dtype=torch.float32)],\n",
    "    enabled_precisions = torch.float32, # Run with FP32\n",
    "    workspace_size = 1 << 22\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf82c87b",
   "metadata": {},
   "source": [
    "Next, we save this optimized model for later inference in C++."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db73ef0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "trt_model_fp32.save('trt_model_fp32.ts')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2ea04a6",
   "metadata": {},
   "source": [
    "Similarly, we optimize and save the model with FP16 precision."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45917a58",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The compiled module will have precision as specified by \"op_precision\".\n",
    "# Here, it will have FP16 precision.\n",
    "trt_model_fp16 = torch_tensorrt.compile(resnet50_model, inputs = [torch_tensorrt.Input((128, 3, 224, 224), dtype=torch.half)],\n",
    "    enabled_precisions = {torch.half}, # Run with FP16\n",
    "    workspace_size = 1 << 22\n",
    ")\n",
    "trt_model_fp16.save('trt_model_fp16.ts')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f616524",
   "metadata": {},
   "source": [
    "## 2. Load and serve the model in C++\n",
    "\n",
    "First, we will need to download the PyTorch C++ API dependencies.\n",
    "\n",
    "### Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e925c56",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "mkdir deps\n",
    "cd deps\n",
    "wget https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.11.0%2Bcu113.zip\n",
    "rm -r libtorch\n",
    "unzip libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "309696b6",
   "metadata": {},
   "source": [
    "## Prepare C++ Code for FP32 Inference\n",
    "\n",
    "The below demonstrate a minimum C++ code harness for loading and inference with the FP32 model: \n",
    "- A makefile \n",
    "- The C++ code for loading the model and run inference on a dummy input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e25e53c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file Makefile\n",
    "CXX=g++\n",
    "DEP_DIR=$(PWD)/deps\n",
    "INCLUDE_DIRS=-I$(DEP_DIR)/libtorch/include -I$(DEP_DIR)/torch_tensorrt/include\n",
    "LIB_DIRS=-L$(DEP_DIR)/torch_tensorrt/lib -L$(DEP_DIR)/libtorch/lib \n",
    "LIBS=-Wl,--no-as-needed -ltorchtrt_runtime -Wl,--as-needed -ltorch -ltorch_cuda -ltorch_cpu -ltorch_global_deps -lbackend_with_compiler -lc10 -lc10_cuda\n",
    "SRCS=main.cpp\n",
    "\n",
    "TARGET=torchtrt_runtime_example\n",
    "\n",
    "$(TARGET):\n",
    "\t$(CXX) $(SRCS) $(INCLUDE_DIRS) $(LIB_DIRS) $(LIBS) -o $(TARGET)\n",
    "\n",
    "clean:\n",
    "\t$(RM) $(TARGET)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a50b6af4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file main.cpp\n",
    "#include <iostream>\n",
    "#include <fstream>\n",
    "#include <memory>\n",
    "#include <sstream>\n",
    "#include <vector>\n",
    "#include \"torch/script.h\"\n",
    "\n",
    "int main(int argc, const char* argv[]) {\n",
    "  if (argc < 2) {\n",
    "    std::cerr\n",
    "        << \"usage: samplertapp <path-to-pre-built-trt-ts module>\\n\";\n",
    "    return -1;\n",
    "  }\n",
    "\n",
    "  std::string trt_ts_module_path = argv[1];\n",
    "\n",
    "  torch::jit::Module trt_ts_mod;\n",
    "  try {\n",
    "    // Deserialize the ScriptModule from a file using torch::jit::load().\n",
    "    trt_ts_mod = torch::jit::load(trt_ts_module_path);\n",
    "  } catch (const c10::Error& e) {\n",
    "    std::cerr << \"error loading the model from : \" << trt_ts_module_path << std::endl;\n",
    "    return -1;\n",
    "  }\n",
    "\n",
    "  std::cout << \"Running TRT engine\" << std::endl;\n",
    "  std::vector<torch::jit::IValue> trt_inputs_ivalues;\n",
    "  trt_inputs_ivalues.push_back(at::randint(-5, 5, {128, 3, 224, 224}, {at::kCUDA}).to(torch::kFloat32));\n",
    "  torch::jit::IValue trt_results_ivalues = trt_ts_mod.forward(trt_inputs_ivalues);\n",
    "  std::cout << \"==================TRT outputs================\" << std::endl;\n",
    "  std::cout << trt_results_ivalues << std::endl;\n",
    "  std::cout << \"=============================================\" << std::endl;\n",
    "  std::cout << \"TRT engine execution completed. \" << std::endl;\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d44a9579",
   "metadata": {},
   "source": [
    "We are now ready to compile."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "440b12df",
   "metadata": {},
   "outputs": [],
   "source": [
    "!make clean && make"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83401de5",
   "metadata": {},
   "source": [
    "And finally, run the inference in C++."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "baf4621f",
   "metadata": {},
   "outputs": [],
   "source": [
    "!./torchtrt_runtime_example $PWD/trt_model_fp32.ts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39733e52",
   "metadata": {},
   "source": [
    "### ## Prepare C++ Code for FP16 Inference\n",
    "\n",
    "In a similar fashion, we can carry out inference with the FP16 model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed61bc93",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file main.cpp\n",
    "#include <iostream>\n",
    "#include <fstream>\n",
    "#include <memory>\n",
    "#include <sstream>\n",
    "#include <vector>\n",
    "#include \"torch/script.h\"\n",
    "\n",
    "int main(int argc, const char* argv[]) {\n",
    "  if (argc < 2) {\n",
    "    std::cerr\n",
    "        << \"usage: samplertapp <path-to-pre-built-trt-ts module>\\n\";\n",
    "    return -1;\n",
    "  }\n",
    "\n",
    "  std::string trt_ts_module_path = argv[1];\n",
    "\n",
    "  torch::jit::Module trt_ts_mod;\n",
    "  try {\n",
    "    // Deserialize the ScriptModule from a file using torch::jit::load().\n",
    "    trt_ts_mod = torch::jit::load(trt_ts_module_path);\n",
    "  } catch (const c10::Error& e) {\n",
    "    std::cerr << \"error loading the model from : \" << trt_ts_module_path << std::endl;\n",
    "    return -1;\n",
    "  }\n",
    "\n",
    "  std::cout << \"Running TRT engine\" << std::endl;\n",
    "  std::vector<torch::jit::IValue> trt_inputs_ivalues;\n",
    "  trt_inputs_ivalues.push_back(at::randint(-5, 5, {128, 3, 224, 224}, {at::kCUDA}).to(torch::kFloat16));\n",
    "  torch::jit::IValue trt_results_ivalues = trt_ts_mod.forward(trt_inputs_ivalues);\n",
    "  std::cout << \"==================TRT outputs================\" << std::endl;\n",
    "  std::cout << trt_results_ivalues << std::endl;\n",
    "  std::cout << \"=============================================\" << std::endl;\n",
    "  std::cout << \"TRT engine execution completed. \" << std::endl;\n",
    "}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba3864a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "!make clean && make"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e174199",
   "metadata": {},
   "outputs": [],
   "source": [
    "!./torchtrt_runtime_example $PWD/trt_model_fp16.ts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5d98b4a",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "In this example, we have walked you through a bare-bone example of optimizing a ResNet model with the Torch-TensorRT API, and then carry out inference with the optimized model in C++. Next, try this on your own models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1f946f7",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
